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Abstract 


1 Introduction 


Many statistical methods for network data 
parameterize the edge-probability by at¬ 
tributing latent traits to the vertices such 
as block structure and assume exchangeabil¬ 
ity in the sense of the Aldous-Hoover rep¬ 
resentation theorem. Empirical studies of 
networks indicate that many real-world net¬ 
works have a power-law distribution of the 
vertices which in turn implies the number of 
edges scale slower than quadratically in the 
number of vertices. These assumptions are 
fundamentally irreconcilable as the Aldous- 
Hoover theorem implies quadratic scaling of 
the number of edges. Recently [Caron and| 
Fox 2014 proposed the use of a different 


notion of exchangeability due to [Kallenber^ 
2006 and obtained a network model which 


admits power-law behaviour while retaining 
desirable statistical properties, however this 
model does not capture latent vertex traits 
such as block-structure. In this work we re¬ 
introduce the use of block-structure for net¬ 
work models obeying Kallenberg’s notion of 
exchangeability and thereby obtain a model 
which admits the inference of block-structure 
and edge inhomogeneity. We derive a simple 
expression for the likelihood and an efficient 
sampling method. The obtained model is not 
significantly more difficult to implement than 
existing approaches to block-modelling and 
performs well on real network datasets. 


Preliminary work. Under review by AISTATS 2016. Do 
not distribute. 


Two phenomena are generally considered important 
for modelling complex network. The first is commu¬ 
nity or block structure where the vertices are parti¬ 
tioned into non-overlapping blocks (denoted by £ = 
1,..., AT in the following) and the probability two ver¬ 
tices i,j are connected depend on their assignment to 
blocks: 


P(Edge between vertex i and j) = 


where € [0,1] is a number only depending on the 
blocks i, m which i,j belongs to. Stochastic block mod¬ 


els (SBMs) were first proposed by White et al. 1976 


and today form the basic starting point for many im¬ 
portant link-prediction methods such as the infinite 
relational model 


Xu et al. 2006 Kemp et al., 2006 


While block-structure is important for link prediction 
the degree distribution of edges in complex networks 


has also attracted a deal of attention Barabasi and 

Albert 

1999 

Newman et al. 

2001 

Strogatz 

20011. In 


many large networks the degree distribution is often 
found to follow a power-law Newman] [2010 


P (Fraction of nodes with k edges) ~ fc (1) 


where 7 « 2.5. Explaining this scaling has lead to 
many important models of network growth where edges 
and vertices are typically added in a sequential fashion. 
A notably example is the preferential attachment (PA) 


model of Barabasi and Albert 1999 


Models such as the IRM and the PA model have dif¬ 
ferent goals. The PA model attempts to explain how 
network structure, such as the degree distribution, fol¬ 
lows from simple rules of network growth and is not 
suitable for link prediction. The IRMs main goal is to 
discover latent block-structure and predict edges, tasks 
which the PA model is not suitable for. In the follow¬ 
ing we will use network model to refer to a model with 
the same aims as the IRM, most notably predicting 
missing edges. 
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Figure 1: (Left:) A network is generated by randomly selecting points from [0, apC K+ (squares) and identifying 
the unique coordinates with a vertex (circles) giving the maximally disconnected graph. (Middle:) The edges 
are restricted to lie at the intersection of randomly generated gray lines at Oi each with a mass/sociability 
parameter wt such that the probability of selecting an intersection is proportional to WiWj giving a non-trivial 
network structure. (Right:) Using three independent measures J2i>i modulating the interacting with 

a parameter rjtrn > 0 allows block-structured networks where each random measure corresponds to a block. 


1.1 Exchangeability 


Invariances is an important theme in Baysian ap¬ 
proaches to network modelling [Kallenbe^ 2006 . For 
network data the criteria which has received most at¬ 
tention is infinite exchangeability of random arrays. 
Suppose we represent the network as a subset of an 
infinite matrix A = such that Aij is the num¬ 

ber of edges between vertex i and j (we will allow multi 
and self-edges in the following). Infinite exchangeabil¬ 
ity of the random array {Aij)ij>i is the requirement 
that [Hoover[ 1979[ Aldous, [l981| 


(^f7-(z)(T{j) )ij>l 


( 2 ) 


for all finite permutations a of N. The distribution of 
a finite network is then obtained by marginalization. 
An important consequence of infinite exchangeability 
is given by the Aldous-Hoover theorem |Hoover[|1979| 


Aldous 1981 according to which a random network 
obeying infinite exchangeability eqn. ([^ has a repre¬ 
sentation in terms of a random function and further¬ 
more the number of edges in the network must either 
scale as the square of the number of vertices or (with 
probability 1) be zero Orbanz and Roy 2015| . None 
of these options are compatible with a power-law de¬ 
gree distribution and one is faced with the dilemma 
of either giving up the power-law distribution or ex¬ 
changeability. It is the first horn of this dilemma which 
have been pursued by much work on Bayesian network 


modelling Orbanz and Roy 2015 


It is however possible to substitute the notation of in¬ 
finite exchangeability in the sense of e qn. ([^ with a 
different definition due to Kallenberg 2006 chapter 


9]. The new notion retains many important character¬ 
istics of the original including a powerful representa¬ 
tion theorem parallelling the Aldous-Hoover theorem 
but expressed in terms of a random set. Important 
progress has recently been made in exploring network 
models based on this representation by [Caron and Fox 


2014 who demonstrate the ability to model power-law 


behaviour of the degree distribution and construct an 
efficient sampler for parameter inference. The reader 
is encouraged to consult this reference for more details. 


In this paper, we will apply the ideas of Caron and] 
Fox 2014 to block-structured network data, thereby 


obtaining a model based on the same structural in¬ 
variance but able to capture both block-structure and 
degree heterogeneity. 

The remainder of the paper is structured as follows. 
In section |2.1| and |2.2| we will provide a basic descrip¬ 
tion of the construction of Caron and Fox [2014| and a 
sketch of our proposed model, dubbed the completely 
random measure stochastic block model (CRMSBM). 
Section [2.31 contains a brief review of random measure 
theory which will be used to define the proposed model 
more explicitly. In section [2.5| we will introduce a sim¬ 
ple sampling scheme and section will investigate the 
model on 11 network datasets. 


2 Methods 

Before introducing the full method we will describe 
the construction informally omitting details relating 
to completely random measures. 
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2.1 A simple approach to sparse networks 

Suppose the vertices in the graph are labelled by real 
numbers in M+. An edge e (edges are considered di¬ 
rected and we allow for self-edges) then consists of two 
numbers {xei,Xe 2 ) & K+ denoted the edge endpoint 
and a network X oi L edges (possibly L = oo) is sim¬ 
ply the collection of points X = {{xei,Xe 2 ))e=i C K+. 
We adopt the convention that multi-edges implies du¬ 
plicates in the list of edges. 

Suppose X is generated by a Poisson process with base 
measure ^ on 


X~PP(e). (3) 


a finite network Xa can then be obtained by consid¬ 
ering the restriction of X to [0, X^ = A fl [0, a[^. 
This representation easily admits sparse networks as 
illustrated in the left pane of figure Suppose ^ 
is the Borel measure, the number of edges is then 
L ^ Poisson(a^) and the edge-endpoints Xei,Xe 2 are 
i.i.d. on [0,a[. The edges are indicated by the gray 
squares in figure [Ta] and the vertices as circles. Notice 
the vertices will be distinct with probability 1 and the 
procedure therefore gives rise to the de-generated but 
sparse network of 2L vertices and L edges shown in 
figure 


To generate non-trivial networks the edge-endpoints 
must co-inside with nonzero probability. Similar to 


Caron and Fox 2014 , suppose the coordinates are re¬ 


stricted to only take a countable number of potential 
values, 01 , 02 ,’■■ G H^-i- and each value has an asso¬ 
ciated sociability (or mass) parameter wi,W 2 ,--- G 
[0,oo[ (we use the shorthand (0^)^ = (0i)“i for se¬ 
ries). If we define the measure /i = 
let ^ = /r X /r, then generating Xa according to the 
procedure of eqn. ^ the position of the edges is still 
selected i.i.d., but with probability proportional to 
WiWj of selecting coordinate {0i,9j). Since the edge- 
endpoints coincide with non-zero probability this pro¬ 
cedure allows the generation of non-trivial associative 
network structure, see figure |lb[ With proper choice 
of {wi,6i)i>i these networks exhibit many desirable 
properties such as a power-law degree distribution and 
sparsity [Caron and Fox 2014 . 


This process can be intuitively extended to block- 
structured networks as illustrated in figure Each 
vertice is assigned a sociability parameter (here indi¬ 
cated by the colors and the symbol Zi G {1,..., AT}) in¬ 
dicating the assignment of vertex i to one of K blocks. 
We can then consider a measure of the form 


K 


C = X! VziZjWiWj6(^0.^g.) = ^ mmlJ-e X (4) 

i,j>l l,Tn—l 


Where we have introduced p,£ = J2i-z ^ 

then a measure on [0, q;[^ and rjgm parameterize the 
interaction strength between community £ and m. No¬ 
tice the number of edges Lim between block I and m is, 
by basic properties of the Poisson process, distributed 
as ^ Foisson{r]trnTiTm) where Ti = ^f([0,a[) and 
in figure the location 0^ of the vertices have been 
artificially ordered according to color for easy visual¬ 
ization. The following section will show the connec¬ 
tion between the above constructio n of eq. (|4|) a nd the 
exchangeable representation due to Kallenberg [2006 


however for greater generality we will let the latent 
trait be a general continuous parameter Ui € [0,1] and 
later show block-structured models can be obtained as 
a special case. 


2.2 Exchangeability and point-process 
network models 


Since the networks in the point-set representation are 
determined by the properties of the measure invari¬ 
ance (i.e. exchangeability) of random point-set net¬ 
works is defined as invariance of this random mea¬ 
sure. Recall infinite exchangeability for infinite ma¬ 
trices eqn. (|^ was the requirement the distribution of 
the random matrix did not change by permutation of 
rows/columns in the network. For a random measure 
on the corresponding requirement is that it should 
be possibly to partition IR+ into intervals Ii, I 2 , 13 ,..., 
permute the intervals, and the random measure should 
be invariant to this permutation. Formally, a random 
measure ^ on is then said to be jointly exchangeable 

if^o((^(g)(^)“^ = ^ for all measure-preserving transfor¬ 
mation ip of R+. According to Kallenberg 2006, the¬ 


orem 9.24] this is ensured provided the measure has a 
representation of the form: 




(5) 


where is a measurable function, C is a random 
variable and {{xi,9i)}i>i is a unit rate Poisson pro¬ 
cess o n R^ (the converse involves five additional 
terms [Kallenberg [2006[ ). In this representation, the 
locations {9i)i and the parameters (xi)i are de-coupled, 
however we are free to select the random parameters 
(xi)i>i to lie in a more general space than R+. Specif¬ 
ically, define 


Xi = {ui,Vi) G [0,1] X R+ 

with the interpretation that each vt corresponds to a 
random mass Wi through a transformation Wi = g{vi) 
and each Ui G [0,1] is a general latent trait of the 
vertex (In figure]^ this parameter corresponded to the 
assignment to blocks). We then consider the following 
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choice: 


h{C,Xi,Xj) = (vj) 


( 6 ) 


where / : [ 0 , 1 ]^ —)■ K+ is a measurable function (com¬ 
pare to the graphon in the Aldous-Hoover representa¬ 
tion) and {{ui,Vi,6i)}i>i follows a unit-rate Poisson 
process on [ 0 , 1 ] x 

To see the connection with the block-structured model, 
suppose the function / as the piece-wise constant func¬ 
tion 

K 

f{u,u')= ^ gernlji{u)lj^{u') 


£,m—l 


where 


Je = 


e-i 




m—1 


m—1 


K 




e=i 


and I3e > 0 and Zi = I denotes the event lj^{ui) = 1 . 
Notice this choice for / is exactly equivalent to the 
graphon for the block-structured network model in the 


Aldous-Hoover representation Orbanz and Roy 2015 
The procedure is illustrated in figure 

Realizations of this point-process is shown in figure 
for various values of K (top-row) and the choice of ran¬ 


dom measure introduced in section 2.3 and using the 
simulation method of Caron and Fox 2014 . For com¬ 


parison the bottom row indicates the corresponding 
standard stochastic block-model representation where 
the edges are distributed uniformly within each tile. 
Notice the K = 1 ,7711 = 1 case corresponds to the 


method of Caron and Fox 2014 


To fully define the method we must first introduce the 
relevant prior for the measure g = X]i>i As 

a prior we will use the Generalized Gamma-process 
(GGP) [Hougaard 1986 . In the following section we 


will briefly review properties of completely random 
measures and use these to derive a simple expression 
of the posterior. 


2.3 Random measures 


As a prior for g, we will use completely random mea¬ 


sures (GRMs) and the reader is referred to Kallen- 
berg[ 2006 Kingman, 1967 for a comprehensive ac¬ 
count. Recall first the definition of a CRM. Assume 
S is a separable complete metric space with the Borel 
cr-field H(S) (comparing to the preceding discussion 
S = [0, a[). A random measure /i is a random variable 
whose values are random measure on S. For each mea¬ 
surable set A G H(S), the random measure induces a 
random variable g{A), and the random measure g will 
be said to be completely random if for any finite collec¬ 
tion Ai,..., An of disjoint measurable sets the random 


K = l K = 2 K ='6 A=4 


- . . '.E 

■ -s 

•y. 


: A,jE 

A ' *'4 


A'-. 

' A gAAsIxA) 

'■■msi 




fc = 188 = 537 k = 689 k = 1961 


Figure 3: (Top row:) Example of four randomly gen¬ 
erated graphs for K = 1,2,3 and 4. The other pa¬ 
rameters were fixed at a = 20, r = 1 ,(t = 0.5 and 
Aa = Ab = 1. Vertices has been sorted according to 
their assignment to blocks and sociability parameters. 
(Bottom row:) The same networks as above but ap¬ 
plying a random permutation to the edges within each 
tile. A standard SBM assumes a network structure of 
this form. 


variables g{Ai),..., g{An) are independent. It was 
shown by Kingman [1967| any such random measure 
g is discrete almost certainly with a representation 




(7) 


where the sequence of masses and locations {wi,9i)i 
(also known as the atoms) is a Poisson random mea¬ 
sure on K'*" X S with mean measure v. The mean 
measure v thus fully characterizes the CRM and is 
known as the Levy intensity measure. We will consider 
homogeneous CRMs where locations are independent, 
v{dw,d9) = p(dw)Ka[d9), and assume Kq, is the stan¬ 
dard Borel measure on [0, a[. 

Since the construction as outlined in hgure [T^ depends 
on sampling the edge start and end-points at random 
from the locations {9i)i with probability proportional 
to Wi of particular interest will be the normalized form 
of eqn. 0. Specifically, the chance of selecting a par¬ 
ticular location from a random draw is governed by 

00 00 

i=l 


which is known as the normalized random measure 
(NRM) and T is known as the total mass of the CRM 
g. A random draw from a Poisson process based on 
the CRM can thus be realized by first sampling the 
number of generated points, L ~ Poisson(r), and then 
drawing their locations iid. from the NRM of eqn. (|^. 

Notice for this definition to make sense the Levy in¬ 
tensity must satisfy the requirements: p{dw) = 00 
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Po,- 


Step 1; Generate candidate vertices 


Step 2: Select graphon 



Vzi 

V-A2 

Vzz 

V21 

V22 

V2A 

Vii 

Vl2 

Via 




Pi 


/3i ft 1 

>1 ~ CRM(p^^^,R+ X [0, ll)(ft)^j ~ Dirichlet(!^^, ■ ■ ■ 


■ is the Levy intensity of a GGP 


Vim ~ Ganinia(Aa, Ab) 


Step 3: Form measure Step 4: Sample network 





Figure 2: (Step 1:) The potential vertex locations, 9i, latent traits Ui and sociability parameters Wi are generated 
using a generalized gamma process (Step 2:) The interaction of the latent traits / : [0,1]^ —>■ K+, the graphon, is 
chosen to be a piece-wise constant function (Step 3-4:) Together, these determine the random measure ^ which 
is used to generate the network from a Poisson process 


and /g°°(l — exp{—w))p{dw) < oo, guaranteeing re¬ 
spectively that T is positive and finite, thereby en¬ 
suring eqnjsl is well-defined. The reader is referred 


to James 2002 for a comprehensive treatment. For 


many important Levy intensities the density of the to¬ 
tal mass does not have a simple analytical expression, 
however the Laplace transform of the random variable 
S corresponding to the total mass T has the general 


form James, 2002 


C[S]{u) = -^(u) = k{S) j p{dw){l - 6"“’"). (9) 

Consider a sequence of random draws made from a 
NRM of the form eqn. Q. Since the NRM is discrete 
almost surely there is a positive probability any two 
draws will select the same atom 9i. This induces a 
random partition U of N known as the exchangeable 
partition probability function where i, j G N are in the 
same block iff. they select the same atom. This parti¬ 
tion is exchangeable and it’s properties can be derived 


from the choice of Levy intensity, see Kingman 1978 


Pitman 2006 for additional details. 


In the following we will focus exclusively on the gen¬ 
eralized gamma process (GGP) as the choice of inten¬ 
sity measure James 2002| . The GGP is parameterized 
with two parameters a, r and has the functional form 


Pa,T{dw) = 


,—1—a^—TW 


r(l-a) 


dw 


The parameters (tr, r) may lie in either the regions ] — 
00, 0] x]0, oo] or jo, l[x [0, oo[. We will focus exclusively 
on the later region as the first corresponds to a finite 
number of atoms. Notice this is the same choice as in 


Caron and Fox 2014 . In conjunction with Kc we thus 


obtain three parameters (a, a, r) which fully describes 
the CRM and induced partition structure. To signify 
this dependence we will use Pa,a,T for the CRM p. For 
this particular choice the eqn. becomes [Pitmanj 


2006 


a ,T ipd) — Pa,<7,ri^dw^i^^ C 


= -[(u + r)'^-rn 


( 10 ) 


For T = 0, a = 1 this is the Laplace transform of a a- 


stable random variable Sa Pitman 2006, Devroye and 


James 2014 which according to Zolotarev’s integral 


representation has the following form Zolotarev, 1964 
fa{x) = 

A{a, u) = 


7r(l - <y) Jo 
sin((l - cr)M)i-'" 


1 

sin(CTM)'’' 


sin(u) 


( 11 ) 


Thus introducing a new random variable Ta,a,T with 
density 


9a,a,rit)=9 -fa{t9 -)f>xit9 -) 


( 12 ) 


where (j)\{t) = e^°~^*', X = , 9 = ^. Standard 

properties of the Laplace transform reveals C[ga,( 7 ,T\ = 
J/ as defined in eqn. (101. Thus, eqn. (TT^ is the den¬ 


sity of total mass T for the GGP. Additional details, 
including efficient sampling methods for this distribu¬ 


tion, is discussed by Devroye and James 2014 


With the notation in place we provide the final form of 
the generative process. Suppose the random measure 
p (restricted to the region [0,a[) have been generated 
from a GGP. Assume Zi = I iff. Ui G Ji and define the 
K sub-measures on [0,a[: 


Pi 






each with total mass = pi{[0,a[). The number 
of points in each tile Lim is then Poissoii{r]imTiTm) 
distributed, and given Lim the edge-endpoints 
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(xei{, Xe 2 m) between atoms in measure £ and m can 
then be drawn from the corresponding NRM. The gen¬ 
erative process is then simply: 

(/3i)f=i ~ Dirichlet(^, ■ ■ •, 

CRM(p,,,,t/[o,i] X UmJ 
Vik Gamma(Aa, Ah) 


for e — 1, . . . , L^rn • 

and Xe2m 


~ Y’o\sson{'qirnTiTm) 
Categorical({’"i/Tf)2,=^) 

Categorical(’"V'r™)23=m) 


For sampling random graphs we used the same adap¬ 
tive thinning strategy as Caron and Fox 2014^ 


2.4 Posterior Distribution 


In order to define a sampling procedure of the 
CRMSBM we must first characterize the posterior dis¬ 
tribution. In [Caron and Fox 2014 this was calculated 
using a specially tailored version of Palms formula. In 
this work we will use a counting argument inspired by 


Pitman 2003 eqn. (32)]. 


To this end, consider first the case where the inter¬ 
action strengths {r](,m)im and block sizes (f3i)e. has a 
fixed value and that number of edges within each 
tile {£, m) is given. Since not all potential vertices 
(i.e. terms WiSg. in /r) will have edges attached to 
them it is useful to introduce a variable which encapsu¬ 
lates this distinction. We therefore define the variable 
Zi = 0,1,..., K with the definition: 


J Zi if there exist {x,y) G st. di G {x,y}, 
\ 0 otherwise. 


Suppose in addition for each measure yn, the end¬ 
points of the edges associated with this measure selects 


• The mass parameter Tq,^ is distributed as gae,a,T 

• For each £ = 1,... ,K, there must be a Poisson 
atom in dwi for each i such that Zi = £ 

• For each £, we know there are Poisson atoms in 
{dwi)z.=g, however since the measure of these in¬ 
tervals is infinitesimal, the remaining mass Tg — 

distributed as gai,a,T- 

• Each edge-endpoint selects the atom indepen¬ 
dently with probability given by the NRM of 
eqn. 0, Wi/Tg. 


The probability eqn. can then be obtained from 
these three contributions as (with k = being 

the total number of vertices in the network): 


W^apa,T{dwi) > 

,i=l J 

W\gai,a,T{Tg- ^ 


W, 


.=g 


(14) 



(15) 


where rii is the total number of times a particular atom 
i of gg is selected in the process. To connect these defi¬ 
nitions to actual network data, i.e. an array (A^ 
notice if the atom {wi,9i) corresponds to a particular 
vertex i in the network then Ui = 


Returning to eqn. (15) for a particular £, the expres¬ 
sion can be integrated by introducing the variables 
sg = J2i-z =i corresponding to the sum of the se¬ 
lected atoms, introducing the parameters Xi = Wij 


and integrating Pitman 

2003 

Lijoi and Walker, 2008 

Favaro et al. 

2013 . With ng 

= J2i-.i.=g'^i eqn. (15) 


can be written as a 



np—kfO'—l /rri \ 

r{ng — kga)T^^ 


(16) 


= \{i-.'Zi=£}\ 


unique atoms and that the number of edge-endpoints 
selecting any particular atom Wi is This naturally 
divides the edge-endpoints associated with a particu¬ 
lar measure £ into a partition, {Bg,..., Bk^} [Pitman 


2003 , and we denote by Ilg 2 L this random partition 


for measure £. For a particular measure the joint dis¬ 
tribution 


P(ng, 2 L = {Bi,...,Bki},w^ G dwi,Tg G dTg) (13) 
is obtained from three contributions (with ag = (dgo)'. 

'^We thank the authors for generously making their code 
available online. 


Recall the number of edges within each tile Lgm is 
Poisson with rate ggmTgTm. In addition, when consid¬ 
ering a concrete observed data matrix the edges does 
not have a particular labelling which is otherwise in¬ 
troduced in the proceeding counting argument. Thus, 
if we observe a number of edges between vertices 
i,j in a particular tile, we must consider all ways a 
network with this number of edges can be obtained by 
our generative process. This is equivalent to the num¬ 
ber of ways of selecting the particular edge-counts of 
the total edge-counts within each tile. The multiplicity 
becomes the multinomial coefficient: 


Pgn 


Pgm^- 


{^ii)z^=g,z.=mj T\.i=l,z.=m^ir 


( 17 ) 
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The probability of obtaining a particular observed net¬ 
work Aij can be obtained by combining eqs. 0 
and the Poisson rates for the edge-counts within each 
tile to obtain: 

-P(A, {Zi)i\{r]£m)inn — iur dT(, {eqn. @}| 

X U < VoiSSOii{Le,\r]imTiTm) Lim- ^ ^ 

Im 


Defining rUm = J2z.=i,z^=j a^d simplifying 


P{A,{zi)i\{r]im),{^e)e) = 


11 ’'? 

.im 


VlmTlT, 



dTidst 


where we have defined 


ke nt — kta—l 


Ei = 


a '^s 


e 


r(nf — k£a)e'^^‘ 


9at 


,r,a{Ti-Se) (1 - Cr)ni- 


Similar to Lijoi and Walker 2008 we will use the sim¬ 


ple change-of-variable from T to t = T—s and a change 
in the order of integration to obtain: 


/b+ Jo 


dTdsh{T,s)= [[ dsdt h{t + s, s). (19) 


Then introducing the Gamma-priors for r]im, Dirichlet 
prior for {j3i)i and integrating over rjim we obtain the 
final expression: 


P{A, a, T, {ae, se, ti)i) = 
YliEi 


r(/ 3 o)nf=i«? ' 


Hi , ^ir 


n 


r(^)^a/3o 

G{\a+ntnn ^b +isi+t£){Sm.+tm)) 


^3 ^3 ^rjryi 


G{\a,\b) 


where G{a,b) = r{a)b “ is the normalization factor 
of the Gamma distribution. Finally notice the 77 = 1 


case, corresponding to Caron and Fox 2014 , can be 


obtained by taking the limit Aq = A;, —> 00 in which 
case —t e~^- When discussing the K = 

1 case we will assume this limit has been taken. 


must be updated and finally the remaining variables 
(ai, si, ti) associated with each expression Ei must be 
updated. We will first consider the later problem: 

Update of variables associated with each E^: 

All terms except the densities ga,iy,T are amenable to 
standard sampling techniques. In [Caron and Fox 


2014 this expression was sampled by employing a pro¬ 


posal distribution proportional to the density, thus al¬ 
lowing their value to cancel. In our work we opted 
for the approach of Lomelf et al. 2014] in which u 
in Zolotarev’s integral representation eqn. 0 is con¬ 
sidered an auxiliary parameter. Thus, introducing 
ug G] 0 , 7 r[ for each RPM gives the full set of variables 
= {at,S£At)Ut) for each RPM. For convenience, 
the domain of the variables are in turn transformed to 
K using the standard change-of-variables x for 

a, t and s and the logistic mappings x H> (1 -I- , 

X I— )■ 7 r(l -I- for a and u. We found a sim¬ 

ple random-walk Metropolis-Hastings sampling with a 
A/’( 0,(7 = 0.1) kernel (50 steps per iteration) was ro¬ 
bust and efficient compared to the other updates. 

Update of Zi'. These variables can be updated di¬ 


rectly from the likelihood eqn. ( 201 , however we opted 


to re-impute the weights {wi)z.=i by inverting the in¬ 
tegration step from eqn. 0 to eqn. ( |l 6 | to obtain 

{wi/si,)i.,z.=i Dirichlet ((n^ - (T),,z.=i) (21) 

doing this for each £ = 1,..., K allows all variables Zi 
to be updated in a regular Gibbs sweep. 

Update of Most networks are binary whereas 

the model assumes count-data. Furthermore to test 
the model it is useful to predict the presence of unob¬ 
served edges. Both of these difficulties are resolved 
by imputation. Suppose we are given a matrix W 
such that Wij = 1 iff. 


the edge-count is unob¬ 
served. Furthermore assume Ay is binary and must 
be imputed. Edges can then in principle be imputed 
(20) directly by performing MCMC updates of Ay and ac¬ 


cepting/rejecting according to the likelihood eqn. ( 201 , 


however the coupling between different counts through 
the gamma functions in Ei would make such a sam¬ 
pling procedure prohibitively expensive. This diffi¬ 
culty is not present in Caron and Fox 2014 where 


the sociability-vector {wi)i are retained and updates 
using Hamiltonian Monte-Carlo, however we can re¬ 
sample {wi)i and {r]irn)im from their marginal distribu¬ 
tions and use the re-sampled values of (wi)i to impute 
the corresponding values of (Ay). Thus for each plate 
Sampling the expression eqn. (20l requires three types (^,m) we sample {wi)z.=i from (21) and rjim from 


2.5 Inference 


of sampling updates: For Ay we must apply a sam¬ 
pling procedure to impute missing values, the sequence 
of block-assignments {zi)i must be updated, the pa¬ 
rameters associated with the random measure a, r 


i]£n 


Gamma(n£m + Aa, (t^ + Sf)(tm + Sm) + Ab) (22) 

the distribution of each unobserved Ay- is then simply 
Poisson(77^Wi?i'j), Zi = £, zj = m. 






































Manuscript under review by AISTATS 2016 



Lag 

Figure 4: Autocorrelation plots for lags up to 1000 
of the parameters a, cr, r (top row) and s, t, u (bottom 
row) for a AT = 1 network drawn from the prior dis¬ 
tribution using a = 25, cr = 0.5 and r = 2. The plots 
were obtained by evaluating the proposed sampling 
procedure for 10® iterations and the shaded region in¬ 
dicates standard deviation obtained over 12 re-runs. 
The simulation indicates reasonable mixing for all pa¬ 
rameters with u being the most affected by excursions. 


2.5.1 Full inference procedure 



Figure 5: AUC score on held-out edges for the se¬ 
lected methods (averaged over 4 restarts) on 11 net¬ 
work datasets. For the same number of blocks, 
the CRMSBM offers good link-prediction performance 
compared to the method of Caron and Fox [2014 


(CRM), a SBM with Poisson observations (plRM) and 
the degree-corrected SBM (DCSBM) [Herlau et al. 


2014). Additional information is found in the supple¬ 


mentary material. 


Aij. Basic validation of the sampling procedure can 
be found in the supplementary material. 


Collecting the preceding steps we obtain 


3 Experiments 


• For £ = 1,..., K, update the four variables in 4)^ 
and cr, r using random-walk metropolis bastings 


Impute using eqs. (22) once for each £ 


and then iterate over i and update each Zi using 
a Gibbs sweep from the likelihood. 


Impute {r]ijn)em and {wi)i using eqs. (22) and (21) 
and for each (ij) such that the edge is either un¬ 
observed {Wij = 1) or must be imputed (A^- > 
1 ) generate a candidate a ^ Poisson{r]imWiWj). 
Then, if Wij = 1 simply set Aij = a, otherwise if 
Wij = 0 and a = 0 reject the update. 


The parameters and cr, t are important for deter¬ 
mining the sparsity and power-law properties of the 


network model Caron and Fox 2014 and to investi¬ 


gate the sampling of these parameters we generated a 
single network problem using a = 25, cr = 0.5, r = 2 
and evaluated 12 samplers with AT = 1 on the prob¬ 
lem. Autocorrelation plots (mean and standard devia¬ 
tion computed over 12 restarts) can be seen in figure]^ 
All parameters mix, however the different parameters 
have different mixing times with in particular u be¬ 
ing affected by excursions. This indicates many slice¬ 
sampling update of 4)^ are required to explore the state 
space appreciably and we therefore applied 150 slice¬ 
sampling updates of for each update of {zi)i and 


The proposed method was evaluated on 11 network 
datasets (a description of how the datasets were ob¬ 
tained and prepared can be found in the supplemen¬ 
tary material). As a criteria of evaluation we decided 
for AUC score on held-out edges, i.e. predicting the 
presence or absence of unobserved edges using the im¬ 
putation method described in the previous section. All 
networks were initially processed by thresholds at 0 
and vertices with zero edges were removed. A fraction 
of 5% of the edges were removed and considered as 
held-out data. 


To examine the effect of using blocks, we compared the 
method against the method of Caron and Fox [2014] 
(CRM) (corresponding to = 1 and AT = 1), a 
standard block-structured model with Poisson obser¬ 
vations (pIRM) [Kemp et al. 2006 and the degree- 


corrected stochastic block model (DCSBM) Herlau 
et al. ] |m4] , a simple model which allows both block- 


structure and degree-heterogeneity but is not ex¬ 
changeable. More details on the simulations and meth¬ 
ods are found in the supplementary material. 


The pIRM was selected since it is the closest block- 
structured model to the CRMSBM without degree- 
correction. This allows us to determine the relative 
benefit of inferring the degree-distribution compared 
to only the block-structure. 
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For priors we selected uniform priors for cr, r, a and 
a Gamma(2,1) prior for /3o,Ao,A{,. For consistency 
similar choices were made for the alternative models. 


All methods were evaluated for T = 2 000 iterations 
and the later half of the chains was used for link 
prediction. We used 4 random selections of held-out 
edges per network to obtain the results seen in figure 
(same sets of held-out edges were used for all meth¬ 
ods). It is evident the use of block-structure is cru¬ 
cial to obtain good link prediction performance. For 
the block-structured methods, the results indicate ad¬ 
ditional benefits from using models which allows the 
modelling of degree-heterogenity for most networks ex¬ 
cept the Hagmann brain connectivity graph. This re¬ 
sult is possibly explained by the Hagmann graph being 
constructed by fixing the number of outgoing edges 
for each vertex. Comparing the CRMSBM and the 
DCSBM, these models perform either on par or with 
a slight advantage to the CRMSBM. Models of net¬ 
works based on the CRM representation of|Kallenberg| 


2006 offers one of the most important new ideas in 


statistical modelling of networks in recent years and to 
our knowledge Caron and Fox 2014 were the first to 
realize the benefits of this modelling approach as well 
as describing it’s statistical properties and provide an 
efficient sampling procedure. 


overlapping blocks, possibly using the correlated ran¬ 
dom measures of Chen et al. 2013 , appears fairly 
straight-forward and should potentially offer a gener¬ 
alization of overlapping block models. 
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The degree distribution of a network is only one of sev¬ 
eral important characteristics of a complex network. 
In this work we have examined how the ideas presented 


Caron and Fox 2014 can be applied for a sim¬ 


ple block-structured network model to obtain a model 
which admits block structure and degree correction. 
Our approach is a fairly straightforward generalization 
of the methods of Caron and Fox 2014 . However, we 
have opted to explicitly represent the density of the to¬ 
tal mass gae,cr,T using Zolotarev’s integral representa¬ 
tion and integrate out the sociability parameters {wi)i 
reducing the number of parameters associated with the 
CRM from the order of vertices to the order of blocks. 


The resulting model has the increased flexibility of be¬ 
ing able to control the degree distribution within each 
block. In practice results of the model on 11 real-world 
datasets indicates this flexibility offers benefits over 
a purely block-structured approaches to link predic¬ 
tion for most networks as well as some potential ben¬ 
efit over an alternative approach to modelling block- 
structure and degree-heterogeneity. The results heav¬ 
ily indicates structural assumptions (such as block- 
structure) is important to obtain reasonable link pre¬ 
diction. 


Block-structured network modelling is in turn the sim¬ 
plest structural assumption for block-modelling. The 
extension of the method of Caron and Fox 2014 to 
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Figure 6: The estimated frequency of all unique net¬ 
works binned according to their unique edge-endpoint 
counts (ni,...,nfc). (ordered decreasingly) for L = 
0,..., 4 edges (41 in total, red circles), as well as the 
frequency obtained by computing the probability (see 
text for details). 



w 


Figure 7: The density of the stick length '^^Wi for 
the randomly generated networks as well as the true 
density eqn. (241 obtained by numerical integration of 
eqn. (231 


A Supplementary Material 


Validation of the sampler 


To investigate the validity of the sampling procedure, 
we considered the iF = 1, Aa = Ah —>■ oo case and used 
the sampling procedure of [Caron and Fox 2014 with 
(a = 2,(7 = 0.5, r = 1) to generate 250 000 random 
networks. As described in the previous section the 
probability of any given network is fully determined by 
the edge-endpoint counts (ni,..., Uk) and the proba¬ 
bility of a particular sequence of counts is permuta¬ 
tion invariant. If ordered decreasingly this gives 41 
unique vectors of edge-endpoint counts (ni,...,n/c) 
for L = 0,1,2, 3,4 (see vertical axis on figure]^ and 
the generated networks were binned according to their 
edge-endpoint count signature (networks with more 
than 4 edges were discarded). In this manner we ob¬ 
tained an estimate of the true frequency of a particular 
network signature. This estimate of the frequency was 
compared against the probability of a given network 
as computed by eqn. (17). Notice that due to per¬ 
mutation invariance the probability of each network 
signature must be corrected by multiplying eqn. (17) 
with a factor obtained by a combinatorial argument 


(see for instance Pitman 2006, eqn. (2.2)]) 


n! 


(*!)-* md’ 


where rrii = l{ni = 1). 


i=l 


We thus obtain two estimates of the probability of a 
particular network signature shown in figure both 
in close agreement. In figure]^ is shown the estimated 
density of the total mass T obtained by numerically 


integrating Zolotarev’s integral representation of fa 


fa{x) = 


A(a, u) = 


r(l-(T) 




sin((l — (7)u)^ '^sin((Tu) 


sin(zi) 

9a,a,r{t) = 0 ~ - f a {t9~ ^ ) (j) x{t9~ ^ ) 


(23) 

(24) 


and the estimated density of the total mass T obtained 
by summing the generated sticks {wi). Both the esti¬ 
mates of the networks signatures and the density of T 
are in close agreement. 


Datasets and preparation To test the methods 
we selected 11 publicly available datasets describing 
social networks, co-authorship networks and biological 
networks. 


Yeast: Interaction network of 2361 proteins in 
yeast 


Bu et al. 2003 


SmaGri: Coauthorship network of 1059 authors 
from the Garfield’s collection of citation net¬ 


works Batagelj and Mrvar 2014 


SciMet: Coauthorship network of 3084 authors from 


the Scientometrics journal, 1978-2000 Batagelj 


and Mrvar 2014 . 


Netscience: Coauthorship network of 1589 authors 
working in network theory as compiled by |New-| 
manl [2006]. 


Hagman: Structural brain networks where edges cor¬ 
respond to the number of fiber tracts between 998 
brain regions. All five networks in the dataset 
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were simply averaged to produce a single net¬ 


work Hagmann et al. 2008 


NIPS: Consisting of the 2865 authors who have coau¬ 
thored papers together at the 1-12’th NIPS con¬ 


ference Roweis 2009 . 


Caltech, Simmons, Reed, Haverford, Swarthmore: 

Five social networks of 769,1446,962,1518,1659 
students respectively obtained from the Face- 


booklOO dataset Traud et al. 2011 . 


The datasets were processes similarly by first remov¬ 
ing any vertices without edges, i.e. where = 0 , and 
thresholding at 0 to produce binary networks. Selec¬ 
tion of the missing edges for link prediction was done 
by first removing a fraction of 5% of all potential edges 
at random and then, if this procedure left any ver¬ 
tices without attached edges, re-introducing one of the 
edges attached to each such vertex and removing (at 
random from all other potential edges) a single edge. 
This procedure was repeated until 5% of the potential 
edges were removed and all vertices had at least one 
edge attached. 


Models considered In addition to non-parametric 
extensions of the Poisson SBM we compared the 
CRMSBM against a degree corrected block model, the 
degree-corrected stochastic block model (DCSBM) of 


Herlau et al. 2014 . This model is not exchangeable 


but does model block structure and sociability. 


Specifically the DCSBM assumes a generative process 
of the form: 


(zi, CRP(a) 

mm ~ Gamma(Aa, Xb) 

Dirichlet(( 7 )f^J 


To be consistent with the CRMSBM we selected a 
prior of the form Gamma(2,1) for a, Aq and Xb- The 
model is somewhat sensitive to the choice of prior for 
7 however we found a prior of the form Gamma(2,1) 
to perform reasonably well. The DCSBM reduces to 
a model wit hout degree-correction, the pIRM [Kemp 


et al. 


2006 


, by the choice 7 ^^ = 
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